Tushare Financial Data Interface
# Stock Fundamental Statistics
Use the get_stock_basics() function to download fundamental data for all stocks at once. This is useful for getting an overview of the market as a whole.
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
stock = ts.get_stock_basics() # Download Stock Fundamental Data
stock.to_excel('stock.xlsx') # Save as spreadsheet
stock.shape # Out: (3678, 22)
The dataset has 22 columns, and each row holds the basic data for one stock; the number of rows depends on when the data is downloaded. See the Tushare website (http://tushare.org) for details of the fields. The columns used in this section are: code, stock code; name, stock name; industry, industry; area, region; pe, price-to-earnings ratio; totals, total share capital (in units of 100 million shares); esp, earnings per share; timeToMarket, listing date.
The data is then read back from the spreadsheet file. Note how the stock code column is handled: when reading, pandas tries to convert data to a numeric type automatically. If a Shenzhen stock code such as '002522' is read as a number, the leading zeros are lost and it becomes the integer 2522, so the code field is explicitly read as a string.
df = pd.read_excel('stock.xlsx', dtype={'code': 'str'}) # Read the code column as a string
df.set_index('code', inplace=True) # Set code as the index column
df.loc['002522'] # Show the fundamentals of one stock
len(df.industry.unique()) # Number of industries
len(df.area.unique()) # Number of regions (the province each company belongs to)
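The effect of type inference on stock codes can be reproduced on a small scale. The sketch below is illustrative only: it reads an in-memory CSV with pd.read_csv rather than the Excel file (so that no Excel engine is needed), and the two rows and company names are made up. It also shows str.zfill as one possible way to restore codes that have already lost their zeros, assuming every code is six digits long.

```python
import io
import pandas as pd

# Made-up sample data; the real file has one row per listed stock.
csv_text = "code,name\n002522,Sample A\n600000,Sample B\n"

# Default behaviour: pandas infers an integer column, dropping leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))
inferred_codes = inferred["code"].tolist()

# Forcing the column to str keeps the codes exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})
text_codes = as_text["code"].tolist()

# Repairing already-converted codes, assuming all codes have 6 digits.
repaired_codes = inferred["code"].astype(str).str.zfill(6).tolist()

print(inferred_codes)
print(text_codes)
print(repaired_codes)
```

Specifying the dtype at read time is usually the safer choice, since the repair step relies on knowing the original code length.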
# Number of listed companies by region, reflecting regional economic strength
df.groupby('area').size().sort_values(ascending=False)
As the statistics above show, the more economically developed and dynamic a region is, the more listed companies it has. Readers can compute similar statistics by industry. The timeToMarket field in the DataFrame holds the listing date as an integer in the format 20190315. The year can be extracted from it to count the number of stocks listed each year.
year = df.timeToMarket.astype('str').str[:4] # Convert to a string and take the first 4 characters (the year)
yearnum = df.groupby(year).size() # Group by year to count the stocks listed each year
yearnum
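The year extraction can be tried on a few hand-made values. The following is a minimal sketch with invented dates, including a 0 to stand in for a missing listing date. Besides the string-slicing approach used here, it shows pd.to_datetime with errors="coerce" as an alternative that turns unparseable values into NaT; that alternative is a suggestion, not part of the original walkthrough.

```python
import pandas as pd

# Invented listing dates; 0 represents a stock with no listing date.
dates = pd.Series([20190315, 20150610, 0, 20150101], name="timeToMarket")

# Approach used in the text: slice the first 4 characters of the string form.
year = dates.astype(str).str[:4]
counts = dates.groupby(year).size()
valid_counts = counts[counts.index != "0"]  # drop the placeholder year '0'

# Alternative: parse as real dates; values that do not match become NaT.
parsed = pd.to_datetime(dates.astype(str), format="%Y%m%d", errors="coerce")
missing = int(parsed.isna().sum())

print(valid_counts)
print(missing)
```

Parsing to real dates makes the missing values explicit, at the cost of an extra conversion step.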
plt.rcParams['font.sans-serif'] = ['SimHei'] # Use SimHei, a Chinese sans-serif font, so that Chinese text renders
# Setting this to False fixes the minus sign '-' on the axes being displayed as a square
plt.rcParams['axes.unicode_minus'] = False
# A few stocks in the dataset have no listing date (year '0'); exclude them from the plot
yearnum[yearnum.index!='0'].plot(fontsize=14, title='年IPO数量') # The title means 'IPOs per year'
The plot shows that the peaks in yearly IPO counts correspond to bull markets in the domestic stock market, while the number of IPOs falls to a low during bear markets. Next, the market's average price-to-earnings ratio (pe) is calculated; it is an important measure of stock market valuation.
df.pe.mean() # Simple arithmetic average pe
Inspecting the dataset shows that loss-making stocks have a pe of 0, so they are excluded from the average.
df[df.pe > 0].pe.mean() # Average pe after excluding loss-making stocks
The pe above is a simple arithmetic average. A pe weighted by market capitalisation may reflect market conditions more accurately. The downloaded dataset contains neither total market capitalisation nor share prices, so the total market capitalisation has to be estimated from the available fields. Deriving a new column from existing columns in this way is common in data processing. The estimate is based on:
Share price = 4 × esp (earnings per share) × pe (price-to-earnings ratio)
Total market capitalisation = share price × totals (total share capital, in units of 100 million shares)
The earnings per share (esp) in the dataset is for a single quarter, so it is multiplied by 4 to approximate full-year earnings.
df['tvalue'] = 4 * df.esp * df.pe * df.totals # Calculate total market value, add new column tvalue
np.sum(df.pe * df.tvalue) / df.tvalue.sum() # Calculation of weighted pe with market capitalisation as the weighting
The result above is the market-capitalisation-weighted pe following a particular quarterly report, and it differs from the true value. Earnings vary from quarter to quarter, so full-year earnings cannot simply be taken as 4 × single-quarter earnings.
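The weighted-average calculation can be checked by hand on a toy table. The sketch below uses three invented stocks (the figures are not real data) and the same column names as the dataset. It uses np.average with a weights argument, which is equivalent to the sum-of-products divided by sum-of-weights expression used earlier.

```python
import numpy as np
import pandas as pd

# Three invented stocks; the last one is loss-making, so its pe is 0.
toy = pd.DataFrame(
    {
        "esp": [0.5, 0.125, -0.25],  # single-quarter earnings per share
        "pe": [10.0, 40.0, 0.0],     # price-to-earnings ratio
        "totals": [2.0, 8.0, 1.0],   # share capital, 100 million shares
    },
    index=["A", "B", "C"],
)

# Estimated market value: (4 * esp * pe) approximates the share price.
toy["tvalue"] = 4 * toy.esp * toy.pe * toy.totals

simple_pe = toy.pe.mean()                         # every stock counts equally
profitable_pe = toy[toy.pe > 0].pe.mean()         # loss-making stocks removed
weighted_pe = np.average(toy.pe, weights=toy.tvalue)  # weighted by market value

print(simple_pe, profitable_pe, weighted_pe)
```

Stock B has four times the estimated market value of stock A, so the weighted figure is pulled towards B's pe of 40. The loss-making stock has an estimated market value of 0 and therefore no influence on the weighted result.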
China's stock market is currently divided into the Shanghai Main Board (stock codes beginning with 60), the Shenzhen Main Board (codes beginning with 00), the ChiNext board, or GEM (codes beginning with 30), and the newly launched STAR Market (codes beginning with 68). The following code calculates the average pe and the number of stocks for each board.
df['board'] = df.index.str[:2] # Take the first 2 characters of the code as a new board column
# Mean pe and number of stocks for each board
df.groupby('board').pe.agg([('pe_mean', 'mean'), ('stock_count', 'count')])
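The two-character prefixes can also be mapped to readable board names before grouping. The sketch below is one possible way to do this, assuming the prefix-to-board correspondence described above; the BOARD_NAMES dictionary is a hypothetical helper, and the pe values are invented for illustration.

```python
import pandas as pd

# Hypothetical lookup table built from the prefixes described in the text.
BOARD_NAMES = {
    "60": "Shanghai Main Board",
    "00": "Shenzhen Main Board",
    "30": "GEM",
    "68": "STAR Market",
}

# Invented pe values, indexed by stock code stored as strings.
sample = pd.DataFrame(
    {"pe": [6.0, 30.0, 8.0, 20.0, 40.0, 60.0]},
    index=["600000", "600519", "000001", "002522", "300750", "688981"],
)

# Take the 2-character prefix of each code and translate it to a board name.
# Prefixes missing from the lookup table would become NaN.
sample["board"] = sample.index.str[:2].map(BOARD_NAMES)

# Mean pe and number of stocks per board.
summary = sample.groupby("board")["pe"].agg(["mean", "count"])
print(summary)
```

Because groupby drops NaN keys by default, any stock whose prefix is not in the lookup table is silently left out of the summary, so it is worth checking sample["board"].isna().sum() on real data.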